Improved Smoothed Analysis of the k-Means Method
The k-means method is a widely used clustering algorithm. One of its
distinguishing features is its speed in practice. Its worst-case running-time,
however, is exponential, leaving a gap between practical and theoretical
performance. Arthur and Vassilvitskii (FOCS 2006) aimed at closing this gap,
and they proved a bound of \poly(n^k, \sigma^{-1}) on the smoothed
running-time of the k-means method, where n is the number of data points and
\sigma is the standard deviation of the Gaussian perturbation. This bound,
though better than the worst-case bound, is still much larger than the
running-time observed in practice.
We improve the smoothed analysis of the k-means method by showing two upper
bounds on the expected running-time of k-means. First, we prove that the
expected running-time is bounded by a polynomial in n^{\sqrt{k}} and
\sigma^{-1}. Second, we prove an upper bound of k^{kd} \cdot \poly(n,
\sigma^{-1}), where d is the dimension of the data space. The polynomial is
independent of k and d, and we obtain a polynomial bound for the expected
running-time for k, d \in O(\sqrt{\log n / \log\log n}).
Finally, we show that k-means runs in smoothed polynomial time for
one-dimensional instances.
Comment: To be presented at the 20th ACM-SIAM Symposium on Discrete Algorithms
(SODA 2009).
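The iteration count bounded by this analysis is that of Lloyd's k-means method: alternate between assigning each point to its nearest center and moving each center to the mean of its cluster, until no assignment changes. A minimal sketch (plain Python on coordinate tuples; function names and the `max_iter` safeguard are illustrative, not from the paper):

```python
import random

def dist2(p, q):
    # Squared Euclidean distance between two coordinate tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=1000):
    # Lloyd's method: the number of iterations until the assignment
    # stabilizes is exactly the running-time the smoothed analysis bounds.
    centers = random.sample(points, k)
    assignment = None
    for step in range(max_iter):
        new_assignment = [min(range(k), key=lambda j: dist2(p, centers[j]))
                          for p in points]
        if new_assignment == assignment:
            return centers, assignment, step  # converged: no point moved
        assignment = new_assignment
        for j in range(k):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:  # keep the old center if the cluster went empty
                centers[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centers, assignment, max_iter
```

On well-separated inputs the loop typically stops after a handful of iterations, which is the practical speed the worst-case exponential bound fails to explain.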
The Alternating Stock Size Problem and the Gasoline Puzzle
Given a set S of integers whose sum is zero, consider the problem of finding
a permutation of these integers such that: (i) all prefix sums of the ordering
are nonnegative, and (ii) the maximum value of a prefix sum is minimized.
Kellerer et al. referred to this problem as the "Stock Size Problem" and showed
that it can be approximated to within 3/2. They also showed that an
approximation ratio of 2 can be achieved via several simple algorithms.
We consider a related problem, which we call the "Alternating Stock Size
Problem", where the number of positive and negative integers in the input set S
are equal. The problem is the same as above, but we are additionally required
to alternate the positive and negative numbers in the output ordering. This
problem also has several simple 2-approximations. We show that it can be
approximated to within 1.79.
Then we show that this problem is closely related to an optimization version
of the gasoline puzzle due to Lov\'asz, in which we want to minimize the size
of the gas tank necessary to go around the track. We present a 2-approximation
for this problem, using a natural linear programming relaxation whose feasible
solutions are doubly stochastic matrices. Our novel rounding algorithm is based
on a transformation that yields another doubly stochastic matrix with special
properties, from which we can extract a suitable permutation.
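To make the objective concrete: the stock size of an ordering is its maximum prefix sum, and an ordering is feasible only if every prefix sum is nonnegative. The sketch below computes this objective and includes a naive greedy that emits a withdrawal whenever the current stock can absorb it; the greedy is purely illustrative and is not claimed to be one of the algorithms analyzed in the paper:

```python
def stock_size(ordering):
    # Return the maximum prefix sum of `ordering`, or None if some prefix
    # sum goes negative (i.e., the ordering is infeasible).
    prefix, peak = 0, 0
    for x in ordering:
        prefix += x
        if prefix < 0:
            return None
        peak = max(peak, prefix)
    return peak

def greedy_ordering(values):
    # Illustrative greedy for a zero-sum multiset: emit the most negative
    # remaining number whenever the current stock can absorb it, otherwise
    # emit the largest remaining positive number.
    pos = sorted((v for v in values if v >= 0), reverse=True)
    neg = sorted(v for v in values if v < 0)  # most negative first
    order, prefix = [], 0
    while pos or neg:
        if neg and prefix + neg[0] >= 0:
            prefix += neg[0]
            order.append(neg.pop(0))
        else:
            prefix += pos[0]
            order.append(pos.pop(0))
    return order
```

Because the input sums to zero, the greedy always produces a feasible ordering; the open question is how far its peak can be from the optimal stock size. The alternating variant additionally forces the output to interleave positives and negatives, which this sketch does not enforce.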
Worst Case and Probabilistic Analysis of the 2-Opt Algorithm for the TSP
2-Opt is probably the most basic local search heuristic for the TSP. This
heuristic achieves amazingly good results on real-world Euclidean instances,
both with respect to running time and approximation ratio. There are numerous
experimental studies on the performance of 2-Opt. However, the theoretical
knowledge about this heuristic is still very limited. Not even its worst-case
running time on 2-dimensional Euclidean instances was known so far. We clarify
this issue by presenting a family of two-dimensional Euclidean instances on
which 2-Opt can take an exponential number of steps.
Previous probabilistic analyses were restricted to instances in which n
points are placed uniformly at random in the unit square [0,1]^2. We consider
a more advanced model in which the points can be placed independently according
to general distributions on [0,1]^d, for an arbitrary dimension d. In
particular, we allow different distributions for different points. We study the
expected number of local improvements in terms of the number n of points and
the maximal density \phi of the probability distributions. We show an upper
bound, polynomial in n and \phi, on the expected length of any 2-Opt
improvement path. When starting with an initial tour computed by an insertion
heuristic, the upper bound on the expected number of steps improves even
further. If the distances are measured according to the Manhattan metric, the
expected number of steps satisfies a still smaller polynomial bound. In
addition, we prove an upper bound in terms of \phi on the expected
approximation factor with respect to all metrics considered.
Let us remark that our probabilistic analysis covers as special cases the
uniform input model with \phi = 1 and a smoothed analysis with Gaussian
perturbations of standard deviation \sigma, for which \phi grows polynomially
in 1/\sigma.
Comment: An extended abstract of this work has appeared in the Proc. of the
18th ACM-SIAM Symposium on Discrete Algorithms. The results of this extended
abstract have been split into two articles (Algorithmica 2014) and (ACM
Transactions on Algorithms 2016). This report is an updated version of the
first journal article, in which two minor errors in the proofs of Lemma 8 and
Lemma 9 have been corrected.
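The local improvements being counted are 2-Opt exchanges: remove two edges of the tour and reconnect by reversing the segment between them whenever that shortens the tour. A minimal sketch under Euclidean distances (the returned step counter is the quantity whose expectation the probabilistic analysis bounds; the structure is illustrative, not the paper's code):

```python
import math

def tour_length(tour, pts):
    # Total length of the closed tour through the given points.
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(pts, tour=None):
    # Run 2-Opt to a local optimum: repeatedly reverse a tour segment
    # whenever doing so shortens the tour, and count the improving steps.
    n = len(pts)
    tour = list(range(n)) if tour is None else tour[:]
    steps = 0
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # Skip the pair (first edge, last edge), which shares a city.
            for j in range(i + 2, n if i > 0 else n - 1):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                         - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
                if delta < -1e-12:  # strict improvement (float-noise guard)
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    steps += 1
                    improved = True
    return tour, steps
```

Starting from a crossing tour on the unit square, a single exchange already uncrosses it; the analyses in the abstract bound how many such exchanges can occur in expectation before a local optimum is reached.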